7.4 Compression


that the probabilities are respectively $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, and $\frac{1}{8}$. Then, from Eq. (6.5) we determine $I = 1.75$ bits per symbol, so we should be able to encode the message (whose relative entropy is $\frac{7}{8}$ and hence redundancy $R$ is $\frac{1}{8}$) such that a smaller channel will suffice to send it. The following code may be used:⁸

A → 0
B → 10
C → 110
D → 111

The average number of binary digits used in encoding a sequence of $N$ symbols will be $N\bigl(\frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{2}{8} \times 3\bigr) = \frac{7}{4}N$. The binary digits 0 and 1 can be seen to have equal probabilities; hence, $I$ for the coded sequence is 1 bit/symbol, equivalent to 1.75 binary symbols per original letter. The binary sequence can be translated back into the four-letter alphabet, two digits at a time, by the transformation

00 → A
01 → B
10 → C
11 → D

The compression ratio of this process is $\frac{7}{8}$. Note, however, that there is no general method for finding the optimal coding.
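A short sketch may make the double process concrete. The names below (CODE, PAIRS, encode, regroup) are illustrative, not from the text; the probabilities and codewords are those of the example:

```python
from math import log2

# The prefix code from the table above and the two-to-one translation back
# into the four-letter alphabet.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
PAIRS = {"00": "A", "01": "B", "10": "C", "11": "D"}
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

def encode(message: str) -> str:
    """Concatenate the codeword of each letter into one binary sequence."""
    return "".join(CODE[ch] for ch in message)

def regroup(bits: str) -> str:
    """Read the binary sequence two digits at a time as letters A-D."""
    return "".join(PAIRS[bits[i:i + 2]] for i in range(0, len(bits), 2))

# Entropy of the source: agrees with I = 1.75 bits/symbol from Eq. (6.5).
I = -sum(p * log2(p) for p in PROBS.values())

# Average codeword length: (1/2)*1 + (1/4)*2 + (2/8)*3 = 7/4 digits/symbol.
avg = sum(PROBS[s] * len(CODE[s]) for s in CODE)

print(I, avg)    # 1.75 1.75 -- the code achieves the entropy
print(avg / 2)   # 0.875 -- compression ratio 7/8 vs. 2 digits per letter
```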

Problem. Using the above coding, show that the 16-letter message “ABBAAADABACCDAAB” can be sent using only 14 letters.
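A two-line check (again with illustrative names) confirms the count:

```python
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
bits = "".join(code[ch] for ch in "ABBAAADABACCDAAB")
print(len(bits), len(bits) // 2)   # 28 binary digits -> 14 two-digit letters
```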

The Shannon technique requires a long delay between receiving symbols for encoding and the actual encoding, in order to accumulate sufficiently accurate individual symbol transmission probabilities. The entire message is then encoded. This is, of course, a highly impractical procedure. Mandelbrot (1952) has devised a procedure whereby messages are encoded word by word. In this case the word delimiters (e.g., spaces in English text) play a crucial rôle. From Shannon’s viewpoint, such a code is necessarily redundant, but on the other hand, an error in a single word renders only that word unintelligible, not the whole message. It also avoids the necessity for a long delay before coding can begin.
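A toy sketch may illustrate the error-containment property. This shows only the idea of word-by-word coding, not Mandelbrot’s actual construction, and the three-word codebook is hypothetical:

```python
# Hypothetical word-level codebook: each whole word, rather than each
# letter, receives a codeword, and a delimiter separates codewords.
CODEBOOK = {"the": "0", "cat": "10", "sat": "110"}
DECODE = {v: k for k, v in CODEBOOK.items()}

def encode_words(text: str) -> str:
    # The space acts as the word delimiter the text describes; from
    # Shannon's viewpoint this delimiter is redundant.
    return " ".join(CODEBOOK[w] for w in text.split())

def decode_words(coded: str) -> str:
    # A corrupted codeword spoils only its own word, not the whole message.
    return " ".join(DECODE.get(cw, "<?>") for cw in coded.split(" "))

coded = encode_words("the cat sat")      # -> "0 10 110"
garbled = coded.replace("10", "11", 1)   # corrupt a single word's codeword
print(decode_words(garbled))             # -> "the <?> sat"
```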

The Mandelbrot coding scheme has interesting statistical properties. One may presume that the encoder seeks to minimize the cost of conveying a certain amount of information using the collection of words that are at his disposal. If $p_i$ is the probability of selecting and transmitting the $i$th word, then the mean information per word contained in the message is, as before, $-\sum p_i \log p_i$. We may suppose that the cost of transmitting a selected word is proportional to its length. If $c_i$ is the cost of transmitting the $i$th word, then the average cost per word is $\sum p_i c_i$. Minimizing the average cost with respect to the probability distribution while keeping the total information constant (using Lagrange’s method of undetermined multipliers) yields

$$p_i = C e^{-D c_i}, \qquad (7.4)$$
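The Lagrange step, omitted above, runs as follows; this display is a reconstruction of the standard argument, with multipliers $\lambda$ and $\mu$ introduced here rather than taken from the text:

```latex
% Minimize the average cost at fixed information and normalization:
\[
  J \,=\, \sum_i p_i c_i
      \,+\, \lambda \sum_i p_i \log p_i
      \,+\, \mu \Bigl(\sum_i p_i - 1\Bigr),
  \qquad
  \frac{\partial J}{\partial p_i}
      \,=\, c_i + \lambda (\log p_i + 1) + \mu \,=\, 0 .
\]
% Solving the stationarity condition for p_i:
\[
  \log p_i = -\frac{c_i + \lambda + \mu}{\lambda}
  \quad\Longrightarrow\quad
  p_i = C e^{-D c_i},
  \qquad
  C = e^{-(\lambda + \mu)/\lambda}, \quad D = \tfrac{1}{\lambda},
\]
% which is Eq. (7.4); C and D are then fixed by normalization and by the
% information constraint.
```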

⁸ Elaborated by D. A. Huffman.